ARM Cores: Frequently Asked Questions
Last updated 29th June 2000
This page contains some Frequently Asked Questions about the ARM Cores.
To find information on a particular subject, simply search within this page using your browser (e.g. Edit->Find in page). Alternatively, use the search button at the top of this page to search the whole of this ARM web site; this will also find other references to your chosen subject in (for example) Data Sheets and Application Notes located elsewhere on this site.
1. Reading or storing data and instructions:
2. Testing the ARM cores:
3. Initialisation and Operation of the ARM core:
4. Corrections to ARM7TDMI data sheet (ARM DDI 0029E):
5. Interrupt behaviour:
6. Interfacing:
7. Synthesisable ARM cores:
8. Cached ARM cores:
1. Reading or storing data and instructions:
What does the ARM core read / write when using non aligned addresses?
When an ARM instruction fetch takes place, the two least significant bits of the address are undefined. The memory controller should therefore always treat an instruction fetch as aligned for the instruction to be fetched as expected, i.e. the memory system should ignore A[1:0] for an ARM instruction fetch. For Thumb instruction fetches, only A[0] should be ignored, because A[1] is needed to indicate the half-word address.
Unaligned accesses can only take place for data loads or stores. In this case, address bits [1:0] indicate which byte is addressed.
Note: In general, the use of unaligned addresses should be avoided. Instead, the appropriate instructions should be used to load/store words (LDR/STR), half-words (LDRH/STRH) or bytes (LDRB/STRB).
The behaviour of unaligned data loads and stores depends on the implementation ('unpredictable'). The following is a description of how the ARM7TDMI and ARM9TDMI cores (and the processors which include these cores) behave, but other cores may behave differently. The behaviour of the ARM7 devices with unaligned data loads is described in the ARM7TDMI Data Sheet. For a general description please refer to the ARM Architecture Reference Manual.
Example:
The following example shows the actions for a Little Endian system when the 2 least significant bits of the address are [10], i.e. not word aligned:
For unaligned data stores:
b) Half word store (
c) Byte store (
For unaligned data read:
b) half word read (
c) byte read (
In general:
Important: If a memory controller is being designed to abort unaligned data transfers, it is essential that the signal nOPC is used to prevent instruction fetches from being aborted. For further information, see the ARM7TDMI Data Sheet (ARM DDI 0029E), sections 4.9 and 4.10. There are also some Application Notes which address related topics:
4. Programmer's Model for Big-Endian ARM
How does Little / Big Endian mode affect aligned / unaligned addressing?
The Endian configuration of your system has no effect if you always read / write words. It is only important when half words or bytes are read / stored.
The following table can be found in the ARM9TDMI Data Sheet (rev 1), section 3.6, and applies both to the ARM9TDMI and the ARM7TDMI (and processors containing these cores). It shows which bits of the data bus are read into the least significant bits of the destination register.
Example:
For further information, see the ARM7TDMI Data Sheet (ARM DDI 0029E), section 4.10.4. There are also some Application Notes which address related topics:
4. Programmer's Model for Big-Endian ARM
How does the memory controller know whether the current access is aligned / non aligned word / half word / byte?
MAS[1:0] is used to indicate whether a word, half-word or byte access is to be performed, and is described on page 9-5 of the ARM7TDMI Data Sheet (ARM DDI 0029E). The signal has the following states:
Important: The memory system must be able to handle word, half word and byte writes! Memory systems supporting only word writes will have severe difficulties supporting C code because the compilers assume that the underlying access types of the ARM architecture are always available. Furthermore, it will not be possible to set software breakpoints in Thumb code using the EmbeddedICE Interface.
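As a rough illustration of how a memory controller might combine nOPC, MAS[1:0] and the low address bits, here is a minimal behavioural sketch in Python. It is not RTL and not ARM's reference logic. The names `decode_access` and `is_unaligned` are hypothetical. The encodings 01 (half-word) and 10 (word) are the ones this FAQ quotes for instruction fetches; 00 for byte and 11 as reserved are assumed from the ARM7TDMI Data Sheet and should be checked against it.

```python
# Behavioural sketch only; names and return format are illustrative.
MAS_BYTE, MAS_HALFWORD, MAS_WORD = 0b00, 0b01, 0b10

# Transfer size in bytes for each MAS[1:0] value (0b11 assumed reserved).
SIZE_BYTES = {MAS_BYTE: 1, MAS_HALFWORD: 2, MAS_WORD: 4}


def decode_access(nopc, mas, addr):
    """Return (kind, size_in_bytes, address_to_use) for one bus access.

    nopc -- 0 for an opcode (instruction) fetch, 1 for a data access
    mas  -- value of MAS[1:0]
    addr -- 32-bit value of A[31:0]
    """
    if mas not in SIZE_BYTES:
        raise ValueError("MAS[1:0] = 11 is reserved")
    size = SIZE_BYTES[mas]
    if nopc == 0:
        if mas == MAS_BYTE:
            raise ValueError("instruction fetches are word or half-word")
        # Instruction fetch: ignore A[1:0] (ARM) or A[0] (Thumb).
        return ("fetch", size, addr & ~(size - 1) & 0xFFFFFFFF)
    # Data access: keep the address so A[1:0] can select the byte lane,
    # or so that an unaligned transfer can be detected and aborted.
    return ("data", size, addr & 0xFFFFFFFF)


def is_unaligned(mas, addr):
    """True if a data access of this size is not naturally aligned."""
    return (addr & (SIZE_BYTES[mas] - 1)) != 0
```

Checking nOPC before testing alignment is what keeps instruction fetches from being aborted by a controller that rejects unaligned data transfers.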
Instruction fetches: The memory controller should ignore A[0] for Thumb instruction fetches (nOPC=0 and MAS[1:0]=01), and A[1:0] for ARM instruction fetches (nOPC=0 and MAS[1:0]=10).
Use of the uni-directional data buses in the ARM7TDMI
The signal BUSEN indicates the data bus configuration:
What is the difference between a von Neumann architecture and a Harvard architecture?
A Harvard architecture has separate data and instruction buses, allowing transfers to be performed simultaneously on both buses. A von Neumann architecture has only one bus which is used for both data transfers and instruction fetches, and therefore data transfers and instruction fetches must be scheduled; they cannot be performed at the same time.
It is possible, and sometimes done, to have two separate memory systems for a Harvard architecture. As long as data and instructions can be fed in at the same time, it doesn't matter whether they come from a cache or memory. But there are problems with this. Compilers generally embed data (literal pools) within the code, and it is often also necessary to be able to write to the instruction memory space, for example in the case of self modifying code, or, if an ARM debugger is used, to set software breakpoints in memory. If there are two completely separate, isolated memory systems, this is not possible. There must be some kind of bridge between the memory systems to allow this.
Using a simple, unified memory system together with a Harvard architecture is highly inefficient. Unless it is possible to feed data into both buses at the same time, it might be better to keep the design simple and stick to a von Neumann architecture.
Use of caches
At higher clock speeds, caches are useful as the memory speed is proportionally slower. Harvard architectures tend to be targeted at higher performance systems, and so caches are nearly always used in such systems.
Von Neumann architectures usually have a single unified cache, which stores both instructions and data. The proportion of each in the cache is variable, which may be a good thing. It would in principle be possible to have separate instruction and data caches, storing data and instructions separately. This probably would not be very useful as it would only be possible to ever access one cache at a time. The ARM7xxT processor cores don't have separate instruction and data caches, but it is possible to define separate banks for instructions and data.
Caches for Harvard architectures are very useful. Such a system would have separate caches for each bus. Trying to use a shared cache on a Harvard architecture would be very inefficient since then only one bus can be fed at a time. Having two caches means it is possible to feed both buses simultaneously, which is exactly what is necessary for a Harvard architecture.
This also allows a very simple unified memory system to be used, with the same address space for both instructions and data. This gets around the problem of literal pools and self modifying code. What it does mean, however, is that when starting with empty caches, it is necessary to fetch instructions and data from the single memory system at the same time. Two memory accesses are therefore needed before the core has all the data it needs, so performance will be no better than with a von Neumann architecture. However, as the caches fill up, it is much more likely that the instruction or data value has already been cached, and so only one of the two has to be fetched from memory. The other can be supplied directly from the cache with no additional delay.
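The warm-up effect described above can be illustrated with a toy cost model in Python. This is only a sketch under simplifying assumptions that are not part of the original text: each step needs one instruction and at most one data item, a cache hit costs nothing extra, each miss costs one access to the single unified memory (which serves one request at a time), and the caches are modelled as unbounded sets. The function names are hypothetical.

```python
def memory_accesses(instr_hit, data_hit, needs_data=True):
    """Serialised accesses to the unified memory needed for one step."""
    accesses = 0
    if not instr_hit:
        accesses += 1          # instruction must come from memory
    if needs_data and not data_hit:
        accesses += 1          # data must also come from memory
    return accesses


def run(trace):
    """Total memory accesses for a list of (instr_addr, data_addr or None)."""
    icache, dcache = set(), set()   # separate caches, one per bus
    total = 0
    for iaddr, daddr in trace:
        total += memory_accesses(iaddr in icache,
                                 daddr in dcache,
                                 needs_data=daddr is not None)
        icache.add(iaddr)
        if daddr is not None:
            dcache.add(daddr)
    return total


# A two-instruction loop executed three times over the same two data words.
loop_cost = run([(0x0, 0x100), (0x4, 0x104)] * 3)
```

In this model only the first pass through the loop touches memory; later passes are served entirely by the two caches.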
The best performance is achieved when both instructions and data are supplied by the caches, with no need to access external memory at all. This is the most sensible compromise and the architecture used by ARM's Harvard processor cores. Two separate memory systems can perform better, but would be difficult to implement.
2. Testing the ARM cores:
How do I drive other ARM7TDMI core input pins while using serialised test vectors via JTAG to test the core?
In general, when using the serial JTAG vectors to test the ARM core, only the signals MCLK, TBE and nRESET need to be driven to special states, as follows:
MCLK: LOW
TBE: HIGH for ICE-Tests, depends on system designs for other tests
nRESET: HIGH
Important: You must also ensure that the external system is isolated from the ARM7TDMI during the serial test to remove the possibility of bi-directional signals clashing on the data bus.
The document Serial Test Procedure explains the serial test procedure in detail.
Can production test vectors be used to determine the maximum core speed of the ARM?
The production test vectors are not designed for 'at speed' testing of the ARM cores. They are designed to give high fault coverage. The ARM cores sample inputs and change outputs on both the rising and falling clock edges. As the clock period gets shorter, outputs start to change in the next cycle. This problem is made worse on a tester because there is the possibility of having a write followed by a read, causing contention on the data bus.
Therefore it is not possible to use these vectors to carry out speed testing of the ARM cores. It may be possible to reduce the cycle time and scale the test vectors to a higher frequency (to reduce test time, for example), but this will not be the actual maximum operating frequency of the ARM core. ARM recommends that the core is characterised using ARM's pre-fab characterisation simulations in combination with measurements on a test chip using special characterisation patterns.
Do the test vectors check the TAP controller ID code?
The TAP controller IDCODE is checked only by the parallel scan vectors.
The IDCODE is provided by 32 transistors with their sources tied to either Vdd or Gnd, and can be changed by altering one metal layer in the layout. If the IDCODE is changed by the ASIC designer, the netlist for layout verification as well as the test vectors must be modified to run without errors.
The length of the IDCODE is 32 bits, subdivided into 4 different fields:
Note that there is no parallel output from the ID register for the ARM7 family. The least significant bit of the register is scanned out first.
For the rev1 ARM7TDMI, the default IDCODE is 0x1F0F0F0F.
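The four IDCODE fields can be pulled apart with a few shifts and masks. The Python sketch below assumes the standard IEEE 1149.1 device identification register layout (version in bits [31:28], part number in bits [27:12], manufacturer identity in bits [11:1], and bit 0 fixed at 1). The function names are illustrative and not part of any ARM tool.

```python
def decode_idcode(idcode):
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    return {
        "version": (idcode >> 28) & 0xF,
        "part_number": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
        "marker_bit": idcode & 0x1,   # 1 for a valid IDCODE
    }


def scan_out_order(idcode):
    """Bits in the order they are scanned out (least significant first)."""
    return [(idcode >> i) & 1 for i in range(32)]


# Default IDCODE quoted above for the rev1 ARM7TDMI.
fields = decode_idcode(0x1F0F0F0F)
```

A test program that changes the IDCODE in layout could use the same decoding to check that the expected bit stream matches the modified value.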
The ID code for the rev1 ARM9 family processors is configurable by the ASIC designer.
What is the timing relationship between TDI/SDIN and TDO/SDOUTBS in the ARM processors which include the EmbeddedICE Logic?
TDI is fed into a D-type Flip Flop, clocked by the rising edge of TCK. The output of this Flip Flop is presented on SDIN. If one of the user added scan chains is selected, SDOUTBS is fed asynchronously through from the user added scan chain to TDO.
How can the ARM core be tested?
For the ARM7TDMI core, there are essentially 3 different approaches to testing:
1) Conventional parallel test:
2) Using serialised test vectors: The TAP controller in the ARM processor core is a standard IEEE1149.1 compliant implementation. However, the scan cells used in the ARM7 core are not fully compliant, because they do not have an 'update' stage. The scan cells in the ARM9 cores are fully IEEE1149.1 compliant, because they do have an 'update' stage.
3) Using AMBA: For ASIC designs using AMBA, this is the recommended approach.
Comparison of production vectors.
Here is a vector count comparison between AMBA (TIC), Parallel, and Serialised (access through JTAG) production test vectors:
* The Debug and ICE test have a high proportion of pre-serialised sections.
How do I add scan chains to the ARM TAP controller?
If the designer wants to test additional devices on the ASIC, either scan chain 3 (boundary scan) can be used, or additional scan chains can be added:
The ARM7TDMI uses scan chains 0-4 and 8 for internal purposes. Additionally, scan chain 15 is used by the system control coprocessor in the ARM710T / ARM720T. Therefore, for the ARM7 family, scan chains 5-7 and 10-14 can be used by the ASIC designer to test additional parts of the system.
Note: The ARM TAP controller is IEEE compliant. It is recommended to follow the IEEE1149.1 specifications when adding scan chains to the TAP Controller.
Latch-based scan cells similar to the following are used:
Latch A has two controlling inputs, SHCLKBS and ECAPCLKBS. It is effectively 2 latches, with a mux which selects whichever clock most recently changed. SHCLKBS and ECAPCLKBS are mutually exclusive signals.
The signals the designer needs to use for the BS cells are:
If you are adding other scan chains (as opposed to just adding a boundary scan chain on scan-chain 3), you will also need to decode equivalent control signals from:
Note that the ARM7TDMI is a latch-based design, and it is assumed that any additional scan cells will be latch-based, too. Latch-based scan cells are discussed in the JTAG specification, IEEE1149.1 Appendix A. If D-types are used rather than a latch-based design, the designer might want to ignore SDINBS and use TDI instead (SDINBS is simply the output of a D-type with TDI at the input, clocked by TCK).
The timing of the boundary scan control signals (when in EXTEST) is:
Note the extra pulse on SHCLK2BS. All ARM core models show the correct behaviour of the TAP Controller, so it is possible to determine the behaviour in more detail from simulations, if necessary. Some other inputs and outputs related to JTAG & TAP controller are also provided, but these are not strictly JTAG signals. They are provided to make it easier to re-use the TAP controller to add scan chains and to implement an external boundary scan chain for the ASIC.
Multiplexing of JTAG pins:
1) Have 2 completely separate JTAG ports (i.e. nTRST, TDI, TCK, TMS, TDO for each one = 10 pins).
2) Have 1 set of pins and a mux pin to decide which JTAG port is accessed at any particular time. This still allows the ARM JTAG test vectors to work and allows the EmbeddedICE Interface or Multi-ICE to be used for debugging. 6 pins are then required.
3) Daisy-chain the 2 JTAG ports together. This requires only 5 pins, but does mean only Multi-ICE can be used for debugging, and not the EmbeddedICE Interface. It will also require changes to be made to the ARM7TDMI serial vectors (if used). See Application Note 72, 'Multi-ICE System Design Considerations', in the ARM Application Notes.
Additional reading:
ARM7TDMI Data Sheet (ARM DDI 0029E)
3. Initialisation and Operation of the ARM core:
What might an initial configuration of the ARM7TDMI look like?
Reset after power up
It is good practice to reset a static device immediately on power-up, to remove any undefined conditions within the device which may otherwise combine to cause a DC path and thereby increase current consumption. Most systems are reset by using a simple RC circuit on the reset pin to remove the undefined states within devices whilst clocking the device.
Note that nRESET must be held asserted for a minimum of two MCLK cycles to fully reset the core. It is necessary to reset the EmbeddedICE Logic and the TAP controller as well, regardless of whether debug features are used or not. This is done by taking nTRST LOW for at least Tbsr.
During reset, the signals nMREQ and SEQ show internal cycles. After nRESET has been removed (i.e. taken HIGH), the ARM core does 2 further internal cycles before the first instruction is fetched from the reset vector (address 0x00). It then takes 3 MCLK cycles in total to advance this instruction through the fetch-decode-execute stages of the ARM instruction pipeline before this first instruction is executed, as shown in the diagram below.
How can the ARM banked registers be initialized?
The way to initialize the banked registers for the different modes is to enter each mode in turn and do the initialisation there. At boot-up, these registers are indeterminate and thus should be initialized to some known value before they are used. In particular, the stack pointers for each mode (r13) should be initialized before use.
Note that when using a model to describe the behaviour of the ARM core, the registers are initialized with the value 0xDEADDEAD.
Example code can be found in the ROM subdirectory of the examples provided with the ARM Software Development Toolkit and the ARM Developer Suite.
Is an internal (I) cycle always followed by a sequential (S) cycle?
No, an I cycle will not always be followed by an S cycle. It can also be followed by a further I cycle, or by an N cycle. As pointed out in the ARM7TDMI / ARM9TDMI data sheets, during an I cycle, the ARM does not require any memory access. It will output the current PC value onto the address bus, as this is the most probable address it will require next. This allows the memory interface to see the address a cycle before it is required.
As explained in the data sheet, this can be a benefit on some memory systems where a sequential address can be decoded (or the memory can be accessed) more quickly than a non-sequential address. As the address will remain the same between the I cycle and the S cycle, DRAM access can be started during the I cycle and then be completed in the S cycle, which may save 1 wait state.
However, in the situation where there is an internal cycle which changes the value of the pc (r15), the address output during the I cycle will not be correct. So the cycle after this will be an N cycle, using the new pc address. An instruction like LDR pc,[r0] or LDMFD sp!,{r0-r12,pc} would cause this. This type of instruction is often used to return from subroutines or C function calls, so this is quite a common case.
Here, the memory decoder will look at the address during the I cycle and start to decode it. It will then sample nMREQ and SEQ (as it would normally) and see that the next cycle will be an N cycle and so it must ignore the address which was on A[] during the I cycle.
If you are designing a memory interface and are not using AMBA, see Application Note 29, Interfacing a memory system to the ARM7TDMI without using AMBA.
4. Corrections to ARM7TDMI data sheet (ARM DDI 0029E):
ARM7TDMI Signal Description (Section 2.1):
Section 2.1 of the ARM7TDMI data sheet (ARM DDI 0029E) explains the signals for the ARM7TDMI. Unfortunately, the data sheet was not fully updated from silicon version rev0 to silicon version rev1, and in a few places the signal descriptions are inaccurate:
Name:     Type:  Description:
---------------------------------------------------------------------------------
COMMRX    O4     When HIGH, this signal denotes that the comms channel receive buffer is FULL. This signal changes on the rising edge of MCLK.
CPA       IC     A Coprocessor which is capable of performing the operation that the ARM7TDMI is requesting (by asserting nCPI) should take CPA LOW immediately. If CPA is HIGH at the end of phase 1 of the cycle in which nCPI is LOW, the ARM7TDMI will abort the coprocessor handshake and take the undefined instruction trap, if CPB is HIGH as well. If no coprocessor is connected to the ARM7TDMI, both CPA and CPB have to be tied HIGH.
DBE       IC     This is an input signal which, when driven LOW, puts the data bus D[31:0] into the high impedance state. It can be used for test or in shared bus systems. It should be held HIGH to allow the ARM to output data.
DBGEN     IC     This input signal allows the debug features of the ARM7TDMI to be disabled. The signal must be HIGH to allow the EmbeddedICE Logic to be used. It should be driven LOW only when debugging will not be required.
SHCLK2BS  O4     ... SHCLK2BS is used to clock the slave half of the external scan cells. ...
Software interrupt (Section 3.9.7):
.... A SWI handler should return by executing the following instruction, irrespective of the state (ARM or Thumb):
    MOVS PC, LR ; LR is R14_svc.
This restores the CPSR and returns to the instruction following the SWI.
Reset (Section 3.11):
When the nRESET signal goes LOW, ARM7TDMI abandons the executing instruction and then continues to fetch dummy instructions from incrementing word addresses with nMREQ=1 and SEQ=0 indicating internal cycles.
Little Endian offset addressing (Section 4.9.3, Table 4.15):
The second diagram in the table should read as follows:
The bidirectional data bus (Section 6.10.2, Table 6-3):
In Section 6.10.2 of the ARM7TDMI data sheet, Table 6-3 describes which signals can be tristated using ABE, DBE or TBE. In the second row of this table, one tick mark is missing: TBE tristates both A[31:0] and D[31:0].
ARM7TDMI Testchip data bus circuit (Section 6.10.3, Figure 6-16):
In Section 6.10.3 of the ARM7TDMI data sheet an example is suggested to connect the ARM7TDMI to an external bus system.
To reduce data out time, a new bus turnaround circuit is suggested. The suggested circuit is available as a pdf file: Improved bus turnaround circuit
Data are sampled into the ARM core on the falling edge of MCLK. nENOUT is an output signal from the ARM core, indicating a write to memory: During a data write cycle, nENOUT changes to LOW in phase 1 and stays LOW throughout phase 2 of the current clock cycle.
During a data write nENOUT is ORed with an external Data Bus Enable (optional) to generate nEN2, an active LOW enable to the output driver. At the same time, the input driver is disabled by using nENOUT ANDed with MCLK. This ensures that nEN1 is HIGH at all times during a data write, thus disabling the input driver. During a data read, nENOUT and therefore nEN2 is HIGH, thus disabling the output driver. During phase 2, both nENOUT and MCLK are HIGH, thus enabling the data bus input driver. Phase1 (MCLK=0) is used as a bus turnaround phase.
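The enable logic described in this paragraph can be written as a small truth-table model. The Python sketch below is an interpretation of the text, not the circuit from the pdf. It assumes that the optional external Data Bus Enable is active LOW (here called `ext_enable_n`, a hypothetical name, tied to 0 when unused), and that the AND of nENOUT and MCLK is inverted to form the active LOW nEN1, since the text requires nEN1 to be HIGH throughout a write.

```python
def driver_enables(nenout, mclk, ext_enable_n=0):
    """Return (nEN1, nEN2), both active LOW.

    nEN1 enables the input driver (external bus into the core).
    nEN2 enables the output driver (core onto the external bus).
    """
    # Output driver: enabled only while the core signals a write.
    nen2 = nenout | ext_enable_n
    # Input driver: enabled only in phase 2 (MCLK HIGH) of a read.
    # The inversion of the AND is an assumption, see the lead-in.
    nen1 = 1 - (nenout & mclk)
    return nen1, nen2


# All four combinations of nENOUT and MCLK with no external enable.
table = {(n, m): driver_enables(n, m) for n in (0, 1) for m in (0, 1)}
```

In this model the two drivers are never enabled in the same phase, and phase 1 of a read has both drivers off, which corresponds to the bus turnaround phase.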
nENIN can now simply be tied low, to indicate to the ARM that it can drive data out as fast as possible. The requirement is then that the ARM doesn't drive the bus too fast, since this might turn ON the ARM bus drivers before nEN1 turns OFF (read followed by a write). For a write followed by a read there is no problem at all, since nEN2 will turn OFF at the start of the read cycle, but nEN1 will not turn ON until the end of the phase, when MCLK rises. The data out time is now from MCLK falling to internal nENOUT. The nENIN to 'Data bus driven' time has been removed. This gains around 8ns on the data out time.
Care must be taken during JTAG operation.
System state determination (Section 8.10 / 8.11):
Section 8.11.2 of the ARM7TDMI data sheet (ARM DDI 0029E) explains how to determine the system status from debug state.
Unfortunately, the data sheet was not fully updated from Silicon Revision 0 to Silicon Revision 1, and in a few places it should read RESTART instead of BYPASS. For Revision 0, the use of BYPASS was correct, but for Revision 1, the RESTART instruction was introduced, and must be used instead. In the following sections, RESTART should be written instead of BYPASS:
Section 8.10.1 Clock switch during debug "At this point, RESTART must be clocked into the TAP instruction register."
Section 8.10.2 Clock switch during test "On exit from test, RESTART must be selected as the TAP controller instruction. When this is done, MCLK can be allowed to resume. After INTEST testing, care should be taken to ensure that the core is in a sensible state before switching back. The safest way to do this is to either select RESTART and then cause a system reset or to insert MOV PC,#0 into the instruction pipeline before switching back."
Section 8.11.2 Determining system status "After the system speed instruction has been scanned into the data bus and clocked into the pipeline, the RESTART instruction must be loaded into the TAP controller."
The instruction RESTART is described in section 8.8.10 of the ARM7TDMI Data Sheet (ARM DDI 0029E).
Important for memory access at system speed: The BREAKPT bit is set in the instruction before the one that is to be executed at system speed. After a load or store instruction at system speed has been executed, debug state is re-entered. Leaving the debug state involves restoring the ARM7TDMI internal state, causing a branch to the next instruction to be executed, and synchronising back to MCLK. This means that the branch instruction will cause the pipeline to be flushed, and debug state will not be re-entered. Instead, the program will return to the address that was active at the time the core went into debug state, continuing with the execution of the program.
5. Interrupt behaviour:
What happens if an interrupt occurs as it is being disabled?
Description:
If an interrupt occurs at the same time as the interrupt is disabled by the program, the ARM7 family may not behave as expected. For example, if a sequence such as
    MRS r0, cpsr
    ORR r0, r0, #I_Bit ;disable interrupts
    MSR cpsr_c, r0
is being executed and an interrupt comes in during execution of the MSR instruction, then the behaviour will be as follows:
    SUBS pc, lr, #4
the SPSR_IRQ is restored to the CPSR. The CPSR will now have the I bit set and therefore execution will continue with interrupts disabled.
However, this can cause problems for particular RTOS vendors in the following case: The RTOS has a single piece of dispatch code which is called by the interrupt routine and also by some regular code which has interrupts disabled. On exit from the dispatch code, the code examines the I bit of the SPSR to determine whether it should perform an interrupt return or a regular return. In this case the dispatch code may become confused and think it has been called from some regular code, and will perform an incorrect return.
Note: The same applies to FIQ interrupts!
Workaround:
The recommended workaround, if your dispatch code does examine the disable bits in the SPSR, is to add code similar to the following at the start of the interrupt routine.
    SUB lr, lr, #4          ; Adjust LR to point to return
    STMFD sp!, {..., lr}    ; Get some free regs
    MRS lr, SPSR            ; See if we got an interrupt while
    TST lr, #I_Bit          ; interrupts were disabled.
    LDMNEFD sp!, {..., pc}^ ; If so, just return immediately.
                            ; The interrupt will remain
                            ; pending since we haven't
                            ; acknowledged it and will be
                            ; reissued when interrupts are next
                            ; enabled.
    ...                     ; Rest of interrupt routine
If interrupt latency is critical, the test of the SPSR and return without acknowledging the interrupt should occur before the shared dispatch code is entered.
What happens if an interrupt occurs as it is being enabled?
Interrupts are enabled by clearing the I (for IRQ) or F (for FIQ) flags in the CPSR with an MSR instruction. If an interrupt occurs as it is being enabled, the instruction following the MSR instruction will still be executed.
The reason is that the new flags are only available to the control logic at the end of the execution stage of the MSR instruction.
The next instruction will have already been decoded and enters the execution stage of the instruction pipeline just as the flags are being changed.
What are the timing requirements of interrupts entering the ARM core?
Interrupts can be synchronous or asynchronous, depending on the 'ISYNC' pin on the core. If interrupts are asynchronous, they will be synchronised using the main processor clock (ECLK) before the interrupt is recognised.
When an interrupt is recognised by the ARM core, the core will finish executing the instruction which is currently in the execution stage of the ARM instruction pipeline before starting the interrupt sequence.
ARM has defined a standard programmer's model of an interrupt controller, as part of our Reference Peripherals Specification. However, your hardware may not necessarily implement this.
Are the IRQ & FIQ interrupts level-sensitive?
Yes. The nIRQ and nFIQ inputs are active low, and level sensitive. They should be driven low and kept low until the interrupt service routine (interrupt handler) acknowledges the exception, then the interrupt request pin should be taken high again.
The normal way this works is that the system will have some interrupt controller external to the ARM7TDMI, which takes the interrupt sources and drives the nIRQ pin (or nFIQ). The interrupt service routine would then read a memory mapped register in the interrupt controller hardware, to find out which interrupt source was active. It would then write to the interrupt controller register to clear the interrupt (causing the nIRQ pin to be de-asserted) and, in the case of a re-entrant interrupt handler, clear the CPSR 'I' bit.
What happens inside the ARM core when an exception occurs?
When an exception occurs, the following happens inside the core:
1) The CPSR is copied to the SPSR of the mode being entered.
* There are two interrupt disable bits, one for FIQ, one for IRQ. When ANY exception occurs, the IRQ bit is set, to disable IRQ. If the exception was FIQ or Reset, then the FIQ disable bit is also set.
The IRQ (or FIQ) handler should clear the source of the interrupt before re-enabling further IRQs. When re-enabling interrupts in your handler, you must be very careful that you have taken the appropriate steps to allow for re-entrant IRQs (and FIQs). Chapter 9 of the SDT 2.50 User Guide provides a detailed discussion on how to write exception handlers.
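The effect of exception entry on the status registers can be sketched as a small Python model. It covers only the CPSR, SPSR and vector selection and ignores the link register and pipeline. The mode numbers, vector addresses and bit positions are taken from the ARM architecture rather than from this FAQ, and the function and dictionary names are hypothetical.

```python
I_BIT, F_BIT, T_BIT = 0x80, 0x40, 0x20   # CPSR bits 7, 6 and 5
MODE_MASK = 0x1F                         # CPSR mode field, bits [4:0]

# exception -> (mode entered, vector address)
EXCEPTIONS = {
    "reset":          (0x13, 0x00),   # Supervisor
    "undefined":      (0x1B, 0x04),   # Undefined
    "swi":            (0x13, 0x08),   # Supervisor
    "prefetch_abort": (0x17, 0x0C),   # Abort
    "data_abort":     (0x17, 0x10),   # Abort
    "irq":            (0x12, 0x18),   # IRQ
    "fiq":            (0x11, 0x1C),   # FIQ
}


def enter_exception(cpsr, kind):
    """Return the SPSR, new CPSR and vector for one exception entry."""
    mode, vector = EXCEPTIONS[kind]
    spsr = cpsr                       # old CPSR saved in SPSR_<mode>
    # (after Reset the saved value is not meaningful on real hardware)
    new_cpsr = (cpsr & ~MODE_MASK) | mode
    new_cpsr &= ~T_BIT                # handlers are entered in ARM state
    new_cpsr |= I_BIT                 # any exception disables IRQ
    if kind in ("fiq", "reset"):
        new_cpsr |= F_BIT             # FIQ and Reset also disable FIQ
    return {"spsr": spsr, "cpsr": new_cpsr & 0xFFFFFFFF, "pc": vector}


# IRQ taken from User mode (0x10) with interrupts enabled.
irq_entry = enter_exception(0x10, "irq")
```

The same model shows why a dispatch routine that inspects the I bit of the SPSR can be misled: if the IRQ is taken while the interrupted code has already set the I bit, the saved SPSR has the I bit set as well.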
See also the following entries elsewhere on this FAQ:
What happens if an interrupt occurs and the interrupt handler does not remove the interrupt?
Upon entry to the IRQ exception handler, the 'I' bit is set and further interrupts cannot be recognised by the core until the handler explicitly re-enables further interrupts by writing to the CPSR. As outlined in a previous entry, 'Are the IRQ (and FIQ) interrupts level-sensitive?', the IRQ handler should not do this until it has acknowledged the interrupt to whatever is driving the nIRQ input.
ARM has defined a standard programmer's model of an interrupt controller, as part of our Reference Peripherals Specification. However, your hardware may not necessarily implement this. There is also some detailed information and example code in the SDT2.50 User Guide, section 9.5.
Is there a priority scheme for exceptions?
When multiple exceptions are valid at the same time (i.e. more than one exception occurs during execution of an instruction), they are handled by the core (after completing execution of the current instruction) according to the following priority scheme.
Reset
The Undefined Instruction and SWI are both caused by an instruction entering the execution stage of the ARM instruction pipeline, so are mutually exclusive and cannot occur at the same time. Thus they have the same priority.
Please note the difference between prioritization of exceptions (when multiple exceptions are valid at the same time), and the actual exception handler code. Exception handlers are themselves liable to interruption by exceptions, and so you must be careful that your exception handlers do not cause further exceptions. If they do, then you must take steps to avoid infinite "exception loops" whereby the link register gets corrupted and points to the entry point of the exception handler, thus giving you no way back to your application code.
The following describes each exception individually.
1) Reset Reset is handled in Supervisor (SVC) mode. Note that one of the very first things that a reset handler should do is to set up the stacks of all the other modes, in case an exception occurs. An exception is not likely to occur in the first few instructions of the reset handler, and indeed no code should be there to provoke such an event: it would be uncommon to have a SWI, an Undefined instruction or a memory access at that point. It is reasonable to assume that your reset handler has been hand crafted to map on to your system exactly, so as to avoid any exceptions taking place during the handling of reset.
2) Data Abort
3) FIQ

When an FIQ is detected, the ARM core automatically disables further FIQs and IRQs (the F and I bits in the CPSR are set for the duration of the FIQ handler). This means that an FIQ handler will *not* be interrupted by another FIQ or an IRQ, unless you specifically re-enable FIQ or IRQ. For IRQ and FIQ, the default behaviour of the ARM core is to avoid nested (reentrant) interrupts.
4) IRQ

When an IRQ is detected, the ARM core automatically disables further IRQs (the I bit in the CPSR is set for the duration of the IRQ handler). This means that an IRQ handler will *not* be interrupted by another IRQ, unless you specifically re-enable IRQ.

Please note that you must be very careful when re-enabling IRQs inside your IRQ handler. See section 9.5.2 of the SDT 2.50 Reference Guide for information.
5) Prefetch Abort
6) SWI
7) Undefined Instruction

6. Interfacing:

Description of the Coprocessor interface of the ARM7TDMI

The following text briefly describes the coprocessor interface of the ARM7TDMI, and how a coprocessor should work.

Note: If no coprocessor is connected to the ARM7TDMI, both CPA and CPB have to be tied HIGH.

The coprocessor has to follow the pipeline of the ARM7TDMI. So it must have 3 stages (fetch, decode & execute), each holding one ARM instruction. The pipeline will advance each time the ARM does an instruction fetch, so the coprocessor pipeline stage will be controlled by (ECLK and NOT(nOPC)). At the decode stage of the pipeline, the coprocessor should examine the instruction opcode it has fetched. If it is a coprocessor instruction that it recognises, it must look to see if the nCPI ARM output goes low in the execution stage - if so, then the coprocessor instruction should be executed.

If the coprocessor just follows D[31:0], sees a relevant coprocessor instruction and then just waits for nCPI, there may be problems. For example, if the next instruction executed is an LDM of all 16 registers, it would be necessary to wait 20 clock cycles before nCPI goes low. One could just let the coprocessor wait for 20 clock cycles, but this could cause problems. If, for example, a branch occurs before this coprocessor instruction was executed and the program runs an instruction for a different coprocessor, both coprocessors may try to execute it simultaneously. It is also necessary to consider the effect of interrupts/aborts occurring just after the coprocessor instruction appears on the D[31:0] bus.

So, upon recognising a relevant instruction, one needs to count pulses of (ECLK and NOT (nOPC)) to count instruction pipeline advances. Only if nCPI goes low 2 pipeline advances after the coprocessor instruction was fetched should this instruction be executed. Besides looking at nOPC, it may be useful to consider TBIT.
Coprocessor instructions are not possible in Thumb state, so in order to save power the pipeline follower in the coprocessor could be switched off during fetching of Thumb instructions.

Once the coprocessor has recognised an instruction, it must drive CPA & CPB. When the ARM has a coprocessor instruction in its execution stage, it looks for CPA to go low. (If CPA is high, the undefined instruction trap is taken). If CPA is low, CPB is also checked. If CPB is high, the ARM will busy-wait until the coprocessor is ready to execute the instruction. During the busy-wait stage, the ARM will take an IRQ or FIQ if one occurs and the coprocessor instruction will be abandoned. This will be signaled to the coprocessor by nCPI going high. If CPB is low, the ARM will continue to fetch/execute subsequent instructions.

Other things to consider are:
Timing of the coprocessor signals:

The timing of nOPC is dependent upon how one controls APE/ALE (i.e. same timing as A[31:0] bus). If APE=ALE=1, then nOPC will change during the clock high phase of the cycle before the actual data transfer takes place.

D[31:0] is valid on the falling edge of MCLK.

nCPI changes off the falling edge of MCLK - the old value stays valid for time Tcpih and the propagation delay for the new value is given by the timing parameter Tcpi. The MCLK input to ECLK output propagation delay (Tcdel) also has to be taken into account.

CPA & CPB are sampled on the MCLK rising edge. Of course, it may be possible to generate these signals during the previous MCLK high phase if the pipeline is being followed. They will be sampled on every MCLK rising edge - the setup and hold times (Tcps and Tcph) have to be met.
[Figure: ARM7TDMI executing a coprocessor MCR instruction.]
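The pipeline-following rule and the CPA/CPB handshake described in this entry can be sketched as a small behavioural model in C. This is only an illustration under stated assumptions, not ARM-supplied code and not a hardware description: the type and function names (cp_follower, cp_follower_advance, cp_should_execute, arm_cp_handshake) are hypothetical, signals are modelled as booleans with true meaning HIGH, and the model ignores the condition field, Thumb state and all signal timing. The opcode test relies on the ARM instruction encoding, in which bits [27:24] are 1110 for CDP/MCR/MRC and 110x for LDC/STC, and bits [11:8] hold the coprocessor number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the coprocessor's 3-stage pipeline follower. */
typedef struct {
    uint32_t stage[3];  /* [0] = fetch, [1] = decode, [2] = execute */
    unsigned cp_num;    /* coprocessor number this unit answers to (0..15) */
} cp_follower;

/* True if insn is a coprocessor instruction addressed to cp_num. */
static bool cp_insn_is_ours(uint32_t insn, unsigned cp_num)
{
    uint32_t op = (insn >> 24) & 0xFu;                  /* bits [27:24] */
    bool is_cp = (op == 0xEu) || ((op & 0xEu) == 0xCu); /* 1110 or 110x */
    return is_cp && (((insn >> 8) & 0xFu) == cp_num);   /* bits [11:8]  */
}

static void cp_follower_init(cp_follower *f, unsigned cp_num)
{
    assert(cp_num < 16u);
    /* 0x00000000 is not a coprocessor opcode, so the pipeline starts idle. */
    f->stage[0] = f->stage[1] = f->stage[2] = 0u;
    f->cp_num = cp_num;
}

/* Call once per pulse of (ECLK and NOT(nOPC)), i.e. once per opcode fetch,
 * with the word seen on D[31:0]. */
static void cp_follower_advance(cp_follower *f, uint32_t fetched)
{
    f->stage[2] = f->stage[1];
    f->stage[1] = f->stage[0];
    f->stage[0] = fetched;
}

/* Execute only when our instruction has reached the execute stage
 * (two pipeline advances after it was fetched) and nCPI is low. */
static bool cp_should_execute(const cp_follower *f, bool ncpi)
{
    return !ncpi && cp_insn_is_ours(f->stage[2], f->cp_num);
}

/* How the ARM reacts to the coprocessor's CPA/CPB response. */
typedef enum { ARM_UNDEFINED_TRAP, ARM_BUSY_WAIT, ARM_CONTINUE } arm_action;

static arm_action arm_cp_handshake(bool cpa, bool cpb)
{
    if (cpa) return ARM_UNDEFINED_TRAP; /* CPA high: no coprocessor accepted it */
    if (cpb) return ARM_BUSY_WAIT;      /* CPA low, CPB high: coprocessor busy  */
    return ARM_CONTINUE;                /* both low: carry on with next fetches */
}
```

For example, after feeding the follower the word 0xEE010F10 (MCR p15, 0, r0, c1, c0, 0) followed by two further opcode fetches, cp_should_execute returns true for coprocessor 15 only while nCPI is low. If the instruction is flushed, for instance by a branch, nCPI never goes low at that point and the follower simply moves on with the next advance.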
Memory mapping hardware registers on word boundaries

Description

It is strongly recommended that ARM based designs align memory mapped hardware registers on word (32-bit) boundaries, rather than sub-word boundaries. The main reason for this is to make the hardware interface easier to implement.

Solution

When reading 8 or 16 bit values from a peripheral the data must be presented on the correct byte lane. For example, when the ARM reads an 8-bit register at address 0x2 it expects the data to be presented on data lines D[23:16]. This means that hardware is needed to route the data from the registers appropriately. If the ARM is in big endian mode then this will be different.

If all memory-mapped registers are on word boundaries then the data can be presented on D[7:0] or D[15:0] and no byte lane steering hardware is required. Writes do not matter so much, because the ARM replicates a byte on all byte lanes, or a half-word on both halves of the data bus, to make interfacing easier.

In general, the hardware must ensure that 8, 16 and 32 bit accesses to memory and registers work correctly and that data is presented on the correct byte lane associated with the size and address of the transfer.

See Application Note 61, "Big and Little Endian Byte Addressing" in the ARM Application Notes.

7. Synthesisable ARM cores:

Differences between the ARM7TDMI-S and the ARM7TDMI

Documentation on the differences between the ARM7TDMI-S and the ARM7TDMI can be found in the ARM7TDMI-S Technical Reference Manual in Appendix B.

8. Cached ARM cores:

Which cached cores are available and what do they include?

The following shows the naming conventions for the cached ARM cores, and a list of cores that are currently available.